A tool that lets you write code, view its results, and write documentation in the same document (.ipynb)
Command mode: press Esc to activate it; the text cursor is not active in the cell.
Edit mode: press Enter to activate it; you can type into the cell (insert mode).
To use the shortcuts below, the cell must be selected but must not be in edit mode.
To enter command mode: Esc
Create a new cell below: b (below)
Create a new cell above: a (above)
Cut a cell: x
Copy a cell: c
Paste a cell: v
Run a cell and stay on it: Ctrl + Enter
Run a cell and move to the next one: Shift + Enter
To see all shortcuts, press h
Code: for Python code
Markdown: for documentation
There are also the Raw NBConvert and Heading cell types
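An .ipynb file is plain JSON, and each cell records its type. As a rough sketch, assuming the nbformat 4 layout (top-level `cells`, `metadata`, `nbformat`, `nbformat_minor`), a minimal notebook with one Markdown cell and one code cell could be built with only the standard library; the cell contents and the helper `make_notebook` are hypothetical. Raw cells would use `"cell_type": "raw"`.

```python
import json

def make_notebook(cells):
    """Hypothetical helper: wrap cell dicts in a minimal nbformat-4 structure."""
    return {
        "cells": cells,
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 2,
    }

markdown_cell = {
    "cell_type": "markdown",   # documentation cell
    "metadata": {},
    "source": ["# Movie analysis\n"],
}

code_cell = {
    "cell_type": "code",       # Python code cell
    "execution_count": None,   # becomes N once the cell runs (shown as In [N])
    "metadata": {},
    "outputs": [],             # results displayed under the cell (Out[N])
    "source": ["import pandas as pd\n"],
}

nb = make_notebook([markdown_cell, code_cell])
text = json.dumps(nb, indent=1)   # the text that would be saved to a .ipynb file
```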
Platforms: osx-64, linux-64, linux-32, win-64, win-32; Python versions: 2.7, 3.4, 3.5
conda install pandas
pip install pandas
Platforms: osx-64, linux-64, linux-32, win-64, win-32; Python versions: 2.7, 3.4, 3.5
conda install matplotlib
pip install matplotlib
In [1]:
import pandas as pd
import matplotlib
%matplotlib inline
In [2]:
%%time
cast = pd.read_csv('data/cast.csv', index_col=None, encoding='utf-8')
In [3]:
%%time
release_dates = pd.read_csv('data/release_dates.csv', index_col=None,
                            parse_dates=['date'])
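`parse_dates` tells `read_csv` to convert the listed columns to datetimes while loading, which is what later enables the `.dt` accessor. A small sketch using an in-memory CSV; the column names and rows are hypothetical stand-ins for data/release_dates.csv.

```python
import io
import pandas as pd

# Hypothetical CSV text; the real file's columns may differ
csv_text = """title,year,country,date
Movie A,2015,USA,2015-07-10
Movie B,2016,Brazil,2016-01-01
"""

release_dates_demo = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(release_dates_demo.dtypes)   # 'date' is datetime64, 'year' is an integer
```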
In [4]:
cast.columns
Out[4]:
In [5]:
titles = cast[['title', 'year']].drop_duplicates().reset_index(drop=True)
titles.head()
Out[5]:
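Since `cast` has one row per role, the same film appears many times. `drop_duplicates()` keeps one row per distinct (title, year) pair, and `reset_index(drop=True)` renumbers the rows from 0 without keeping the old index as a column. A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical cast rows: one row per (person, film) pair
cast_demo = pd.DataFrame({
    "title": ["Hamlet", "Hamlet", "Hamlet", "Macbeth"],
    "year": [1948, 1948, 1996, 1948],
    "name": ["A", "B", "C", "D"],
})

# One row per distinct (title, year); index renumbered 0..n-1
titles_demo = cast_demo[["title", "year"]].drop_duplicates().reset_index(drop=True)
print(titles_demo)
```

The two 1948 Hamlet rows collapse into one, while the 1996 Hamlet stays separate because the year differs.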
df.head(n): returns the first n rows (n=5 by default)
In [6]:
cast.head()
Out[6]:
In [7]:
release_dates.head()
Out[7]:
df.tail(n): returns the last n rows (n=5 by default)
In [8]:
cast.tail()
Out[8]:
In [9]:
release_dates.tail()
Out[9]:
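A quick sketch of `head` and `tail` with and without an explicit `n`, on a hypothetical 10-row frame. Both return new DataFrames that keep the original index labels.

```python
import pandas as pd

df = pd.DataFrame({"year": range(2000, 2010)})  # hypothetical 10 rows

first = df.head()     # first 5 rows (default n=5)
last3 = df.tail(3)    # last 3 rows, index labels 7, 8, 9 are preserved
```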
In [10]:
len(cast), len(release_dates)
Out[10]:
In [11]:
cast['type']
Out[11]:
In [12]:
cast.type.head()
Out[12]:
In [13]:
c = 'type'
cast[c].head()  # cast.c.head() will not work: it looks for a column literally named 'c'
Out[13]:
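Bracket access uses the *value* of the variable as the column name, while attribute access uses the literal identifier written after the dot. A small sketch on a hypothetical frame showing that `df[c]` and `df.type` agree, while `df.c` raises `AttributeError`:

```python
import pandas as pd

df = pd.DataFrame({"type": ["actor", "actress"]})  # hypothetical data
c = "type"

by_bracket = df[c]    # column whose name is the value of c, i.e. "type"
by_attr = df.type     # attribute access with the literal name

# df.c looks for a column literally named "c", which does not exist
try:
    df.c
    attr_failed = False
except AttributeError:
    attr_failed = True
```

Attribute access also fails for column names that are not valid Python identifiers (for example, names with spaces), so brackets are the safer general choice.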
df[col].unique(): returns the distinct values of a column
In [14]:
cast['type'].unique()
Out[14]:
In [15]:
cast['type'].value_counts()
Out[15]:
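`unique()` lists each distinct value once, in order of first appearance, while `value_counts()` counts how often each value occurs, sorted from most to least frequent. A sketch on hypothetical values:

```python
import pandas as pd

types = pd.Series(["actor", "actress", "actor", "actor"])  # hypothetical

distinct = types.unique()       # distinct values, order of first appearance
counts = types.value_counts()   # occurrences per value, most frequent first
```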
In [16]:
h = cast.head()
h
Out[16]:
In [17]:
h.year // 10 * 10  # decade
Out[17]:
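The decade trick works because floor division by 10 drops the last digit of the year and multiplying by 10 restores the scale. A sketch on hypothetical years:

```python
import pandas as pd

years = pd.Series([1894, 1908, 1999, 2000, 2016])  # hypothetical
decades = years // 10 * 10   # 1894 -> 189 -> 1890
```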
In [18]:
h
Out[18]:
In [19]:
h.year > 2000
Out[19]:
In [20]:
cast[cast.character == 'Macduff Child']
Out[20]:
In [21]:
h[['title', 'year']]
Out[21]:
In [22]:
h[h.n.isnull()]
Out[22]:
In [23]:
h[h.n.notnull()]
Out[23]:
In [24]:
h
Out[24]:
In [25]:
h[[True, False, True, False, False]]
Out[25]:
In [26]:
h.year > 2000
Out[26]:
In [27]:
h[h.year > 2000]
Out[27]:
In [28]:
h[(h.year > 2000) & (h.year < 2016)]  # & for 'and', | for 'or'
Out[28]:
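The parentheses around each comparison are required: `&` and `|` bind more tightly than `>` and `<`, so without them Python evaluates `2000 & h.year` first and then tries to turn a boolean Series into a single `True`/`False`, which pandas refuses. A sketch on hypothetical years:

```python
import pandas as pd

h_demo = pd.DataFrame({"year": [1998, 2005, 2016, 2010]})  # hypothetical

recent = h_demo[(h_demo.year > 2000) & (h_demo.year < 2016)]   # element-wise AND
either = h_demo[(h_demo.year < 2000) | (h_demo.year >= 2016)]  # element-wise OR

# Without parentheses the expression is parsed differently and fails
try:
    h_demo[h_demo.year > 2000 & h_demo.year < 2016]
    precedence_error = False
except (ValueError, TypeError):
    precedence_error = True
```

Python's `and`/`or` keywords fail on a Series for the same reason, which is why pandas uses `&` and `|`.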
Filling missing values (NaN) in the whole DataFrame:
In [29]:
h.fillna(0)
Out[29]:
Filling missing values in a single column:
In [30]:
h.n.fillna(0)
Out[30]:
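`isnull()` builds a boolean mask of missing values, and `fillna()` returns a new object with the NaNs replaced, leaving the original untouched unless the result is assigned back. A sketch with a hypothetical `n` column where one value is missing:

```python
import numpy as np
import pandas as pd

h_demo = pd.DataFrame({"character": ["Hamlet", "Extra"], "n": [1.0, np.nan]})  # hypothetical

missing = h_demo[h_demo.n.isnull()]   # rows where n is NaN
filled_df = h_demo.fillna(0)          # new DataFrame, every NaN replaced by 0
filled_col = h_demo.n.fillna(0)       # new Series, only column n filled
```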
In [31]:
cast.year.value_counts()#.head(10)
Out[31]:
In [32]:
cast.year.value_counts().plot()
Out[32]:
In [33]:
cast.year.value_counts().sort_index()#.head()
Out[33]:
In [34]:
cast.year.value_counts().sort_index().plot()
Out[34]:
In [35]:
import numpy as np
bins = np.arange(1880, 2040, 2)  # 2-year bin edges
cast.year.hist(bins=bins)
Out[35]:
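`np.arange(1880, 2040, 2)` produces bin *edges* (the stop value 2040 is excluded), so the histogram gets one bin per two-year interval. The same edges can be checked numerically with `np.histogram`, sketched here on hypothetical years:

```python
import numpy as np

bins = np.arange(1880, 2040, 2)   # edges 1880, 1882, ..., 2038

years = np.array([1894, 1895, 2015])          # hypothetical
counts, edges = np.histogram(years, bins=bins)
# 1894 and 1895 fall in [1894, 1896), which is bin index 7
```

Because the last edge is 2038, any year from 2039 on would fall outside every bin.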
In [36]:
g = cast.groupby([cast.year // 10 * 10, 'type']).size()
g
Out[36]:
In [37]:
u = g.unstack()
u
Out[37]:
In [38]:
a = u['actor'] - u['actress']
a
Out[38]:
In [39]:
a.plot()
Out[39]:
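Grouping by two keys gives a Series with a two-level index (decade, type); `unstack()` moves the inner level into columns so the two types can be subtracted row by row. A sketch on hypothetical rows:

```python
import pandas as pd

cast_demo = pd.DataFrame({
    "year": [1951, 1955, 1958, 1962, 1967],
    "type": ["actor", "actor", "actress", "actor", "actress"],
})  # hypothetical

g = cast_demo.groupby([cast_demo.year // 10 * 10, "type"]).size()  # index: (decade, type)
u = g.unstack()                    # one column per type
diff = u["actor"] - u["actress"]   # difference per decade
```

If some decade lacked one of the types, `unstack()` would leave NaN there; `g.unstack(fill_value=0)` is one way to treat that as zero.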
In [40]:
release_dates.head()
Out[40]:
In [41]:
release_dates.date.dt.year.head()
Out[41]:
In [42]:
release_dates.date.dt.dayofweek.head()  # Monday=0, Sunday=6
Out[42]:
In [43]:
len(release_dates[release_dates.date.dt.dayofweek == 4]) * 100 / len(release_dates)  # % of releases on a Friday
Out[43]:
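The `.dt` accessor exposes date parts of a datetime column: `dayofweek` runs from Monday=0 to Sunday=6 and `dayofyear` from 1 to 366. Taking the mean of a boolean mask gives the same percentage as the `len(...) * 100 / len(...)` formula above. A sketch on hypothetical dates (2016-01-01 was a Friday):

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-04", "2016-01-08"]))

weekday = dates.dt.dayofweek      # Monday=0 ... Sunday=6
day_of_year = dates.dt.dayofyear  # 1 ... 366

pct_friday = (weekday == 4).mean() * 100   # share of dates falling on a Friday
```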
In [44]:
cast.head()
Out[44]:
In [45]:
release_dates.head()
Out[45]:
In [46]:
c = cast[cast.name == 'Ellen Page']
c = c.merge(release_dates)
c.head()
Out[46]:
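Without an `on=` argument, `merge` joins on all columns the two frames share (here `title` and `year`), and the default `how="inner"` keeps only rows that match on both sides. A film released in several countries therefore produces one merged row per release. A sketch on hypothetical data:

```python
import pandas as pd

cast_demo = pd.DataFrame({"title": ["Movie A", "Movie B"], "year": [2007, 2010],
                          "name": ["Actor X", "Actor X"]})           # hypothetical
release_demo = pd.DataFrame({"title": ["Movie A", "Movie A", "Movie C"],
                             "year": [2007, 2007, 2009],
                             "country": ["USA", "Brazil", "USA"]})   # hypothetical

# Joins on the shared columns (title, year); unmatched rows are dropped
merged = cast_demo.merge(release_demo)
```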
In [47]:
titles.sort_values('year').head(1)
Out[47]:
In [48]:
len(titles[titles.year == 1960])
Out[48]:
In [49]:
for y in range(1970, 1980):
    print(y, (titles.year == y).sum())
In [50]:
titles[titles.year // 10 == 197].year.value_counts().sort_index()
Out[50]:
In [51]:
titles.groupby('year').size().loc[1970:1979]
Out[51]:
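The three cells above count films per year in the 1970s in different ways. A sketch on hypothetical titles showing that the `value_counts` and `groupby` versions agree, that `.loc` label slicing includes both endpoints, and that only the explicit loop reports years with zero films:

```python
import pandas as pd

titles_demo = pd.DataFrame({"title": list("abcdef"),
                            "year": [1969, 1970, 1970, 1975, 1979, 1980]})  # hypothetical

# 1) value_counts restricted to the decade, sorted by year
a = titles_demo[titles_demo.year // 10 == 197].year.value_counts().sort_index()

# 2) groupby + size, then label slicing (1979 is included)
b = titles_demo.groupby("year").size().loc[1970:1979]

# 3) explicit loop: also yields 0 for years without films
counts_by_loop = {y: int((titles_demo.year == y).sum()) for y in range(1970, 1980)}
```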
In [52]:
birth = 1990
len(titles[(titles.year >= birth) & (titles.year <= 2016)])
Out[52]:
In [53]:
titles[titles.year <= 1906][['title']]
Out[53]:
In [54]:
titles.year.min()
Out[54]:
In [55]:
titles.set_index('year').sort_index().loc[1894:1906]
Out[55]:
In [103]:
titles.title.value_counts().head(15)
Out[103]:
In [106]:
len(cast[cast.name == 'Judi Dench'])
Out[106]:
In [108]:
c = cast
c = c[c.name == 'Judi Dench']
c = c[c.n == 1]
c.sort_values('year')
Out[108]:
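Reassigning `c` after each filter means every new mask is built from the already-filtered frame. This gives the same rows as one combined mask over the original frame. A sketch on hypothetical rows:

```python
import pandas as pd

cast_demo = pd.DataFrame({
    "name": ["Actor X", "Actor X", "Actor Y"],
    "year": [2001, 1999, 2001],
    "n": [1.0, 3.0, 1.0],
})  # hypothetical

step = cast_demo[cast_demo.name == "Actor X"]
step = step[step.n == 1]   # mask built from the filtered frame

combined = cast_demo[(cast_demo.name == "Actor X") & (cast_demo.n == 1)]
```

Writing `cast[mask1][mask2]` with `mask2` built from the unfiltered frame also tends to work through index alignment, but pandas warns that the boolean key is being reindexed, so the step-by-step or combined form is clearer.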
In [38]:
c = cast
c = c[c.name == 'Judi Dench']
c
Out[38]:
In [40]:
c = cast
c = c[c.title == 'Sleuth']
c = c[c.year == 1972]
c.sort_values('n')
Out[40]:
In [111]:
cast[cast.year == 1985].name.value_counts().head(10)
Out[111]:
Platforms: osx-64, linux-64, linux-32, win-64, win-32; Python versions: 2.7, 3.4, 3.5
conda install scikit-learn
pip install -U scikit-learn
In [44]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
import pickle
import time
time1=time.strftime('%Y-%m-%d_%H-%M-%S')
In [45]:
iris = pd.read_csv('iris.csv', index_col=None, encoding='utf-8')
In [46]:
iris.columns
Out[46]:
In [47]:
target_data = iris['species']
features = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
feature_data = iris[features]
In [48]:
features_train, features_test, target_train, target_test = train_test_split(feature_data, target_data, test_size=0.33, random_state=42)
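With a float `test_size`, scikit-learn rounds the test size up, so for 150 rows (the size of the classic iris dataset) `test_size=0.33` yields 50 test rows and 100 training rows; a fixed `random_state` makes the shuffle reproducible. A sketch on hypothetical arrays:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(300).reshape(150, 2)   # hypothetical: 150 samples, 2 features
y = np.array([0, 1, 2] * 50)         # hypothetical labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Same random_state, same split
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.33, random_state=42)
```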
In [49]:
dt = DecisionTreeClassifier()
dt = dt.fit(features_train, target_train)
In [50]:
with open('iris-dt_' + time1, 'wb') as f:
    pickle.dump(dt, f)
In [51]:
with open('iris-dt_' + time1, 'rb') as f:
    dt = pickle.load(f)  # keep the reloaded model; load() returns it
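The save/load cells above are a pickle round trip: `dump` writes the object in binary mode and `load` returns a new, equal object that must be assigned to a name. A self-contained sketch using a plain dict as a hypothetical stand-in for the fitted model and a temporary directory instead of the notebook folder:

```python
import os
import pickle
import tempfile
import time

# Hypothetical stand-in for a fitted model; any picklable object behaves the same
model = {"kind": "decision_tree", "classes": ["setosa", "versicolor", "virginica"]}

stamp = time.strftime("%Y-%m-%d_%H-%M-%S")   # same format as time1 above
path = os.path.join(tempfile.gettempdir(), "iris-dt_" + stamp)

with open(path, "wb") as f:     # binary write
    pickle.dump(model, f)

with open(path, "rb") as f:     # binary read
    restored = pickle.load(f)   # a new object equal to the original

os.remove(path)                 # clean up the temporary file
```

Only unpickle files you trust: loading a pickle can execute arbitrary code.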
In [52]:
predictions = dt.predict(features_test)
confusion_matrix(target_test, predictions)
Out[52]:
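A confusion matrix counts, for each true class (rows), how many samples were predicted as each class (columns); correct predictions lie on the diagonal. scikit-learn's `confusion_matrix` uses this convention with labels in sorted order. As an illustration only, the same table can be built with `pd.crosstab` on hypothetical labels, reindexed so that classes never predicted still get a column:

```python
import pandas as pd

# Hypothetical true labels and predictions
y_true = pd.Series(["setosa", "setosa", "versicolor", "virginica", "virginica"], name="true")
y_pred = pd.Series(["setosa", "setosa", "virginica", "virginica", "virginica"], name="predicted")

labels = sorted(set(y_true) | set(y_pred))
# Rows = true class, columns = predicted class; missing combinations filled with 0
cm = pd.crosstab(y_true, y_pred).reindex(index=labels, columns=labels, fill_value=0)

accuracy = cm.values.diagonal().sum() / cm.values.sum()
```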